skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Althoff, Tim"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Rules are a critical component of the functioning of nearly every online community, yet it is challenging for community moderators to make data-driven decisions about what rules to set for their communities. The connection between a community's rules and how its membership feels about its governance is not well understood. In this work, we conduct the largest-to-date analysis of rules on Reddit, collecting a set of 67,545 unique rules across 5,225 communities which collectively account for more than 67% of all content on Reddit. More than just a point-in-time study, our work measures how communities change their rules over a 5+ year period. We develop a method to classify these rules using a taxonomy of 17 key attributes extended from previous work. We assess what types of rules are most prevalent, how rules are phrased, and how they vary across communities of different types. Using a dataset of communities' discussions about their governance, we are the first to identify the rules most strongly associated with positive community perceptions of governance: rules addressing who participates, how content is formatted and tagged, and rules about commercial activities. We conduct a longitudinal study to quantify the impact of adding new rules to communities, finding that after a rule is added, community perceptions of governance immediately improve, yet this effect diminishes after six months. Our results have important implications for platforms, moderators, and researchers. We make our classification model and rules datasets public to support future research on this topic. 
    more » « less
    Free, publicly-accessible full text available June 7, 2026
  2. Many researchers studying online communities seek to make them better. However, beyond a small set of widely-held values, such as combating misinformation and abuse, determining what `better’ means can be challenging, as community members may disagree, values may be in conflict, and different communities may have differing preferences as a whole.In this work, we present the first study that elicits values directly from members across a diverse set of communities.We survey 212 members of 627 unique subreddits and ask them to describe their values for their communities in their own words. Through iterative categorization of 1,481 responses, we develop and validate a comprehensive taxonomy of community values, consisting of 29 subcategories within nine top-level categories enabling principled, quantitative study of community values by researchers. Using our taxonomy, we reframe existing research problems, such as managing influxes of new members, as tensions between different values, and we identify understudied values, such as those regarding content quality and community size. We call for greater attention to vulnerable community members' values, and we make our codebook public for use in future research. 
    more » « less
  3. This paper presents a mixed-methods study of app-based motorcycle taxis in Thailand to explore the social dynamics of rideshare drivers and their exercised autonomy both through social pressure and a hostile work environment. As motorcycle taxis are open-air vehicles, drivers can be exposed to prolonged air pollution and other weather events, potentially impacting their health. In an initial quantitative study of server-side rideshare logs, we unexpectedly found that drivers do not exercise the autonomy provided by their rideshare platform to avoid air pollution events. This prompted a follow-on investigation through semi-structured interviews of both drivers and passengers in three provinces to explore why these drivers fail to experience the autonomy promised by gig-work in this context and elucidated further examples this lack of autonomy experienced by drivers. Our study sheds light on the social context that may constrain a driver's agency, including financial pressures, weather conditions, conflicts with local taxi organizations, and a false perception that drivers need to work around the ride assignment algorithm to avoid being blacklisted. We find that when leveraging app-based rideshare opportunities, drivers simultaneously perceive increased flexibility in their work hours and a lack of agency to prioritize their health and safety. We conclude with a discussion on potential interventions aimed at mitigating the forces preventing drivers from exercising their autonomy. 
    more » « less
  4. Abstract Neuropsychiatric disorders pose a high societal cost, but their treatment is hindered by lack of objective outcomes and fidelity metrics. AI technologies and specifically Natural Language Processing (NLP) have emerged as tools to study mental health interventions (MHI) at the level of their constituent conversations. However, NLP’s potential to address clinical and research challenges remains unclear. We therefore conducted a pre-registered systematic review of NLP-MHI studies using PRISMA guidelines (osf.io/s52jh) to evaluate their models, clinical applications, and to identify biases and gaps. Candidate studies (n = 19,756), including peer-reviewed AI conference manuscripts, were collected up to January 2023 through PubMed, PsycINFO, Scopus, Google Scholar, and ArXiv. A total of 102 articles were included to investigate their computational characteristics (NLP algorithms, audio features, machine learning pipelines, outcome metrics), clinical characteristics (clinical ground truths, study samples, clinical focus), and limitations. Results indicate a rapid growth of NLP MHI studies since 2019, characterized by increased sample sizes and use of large language models. Digital health platforms were the largest providers of MHI data. Ground truth for supervised learning models was based on clinician ratings (n = 31), patient self-report (n = 29) and annotations by raters (n = 26). Text-based features contributed more to model accuracy than audio markers. Patients’ clinical presentation (n = 34), response to intervention (n = 11), intervention monitoring (n = 20), providers’ characteristics (n = 12), relational dynamics (n = 14), and data preparation (n = 4) were commonly investigated clinical categories. Limitations of reviewed studies included lack of linguistic diversity, limited reproducibility, and population bias. A research framework is developed and validated (NLPxMHI) to assist computational and clinical researchers in addressing the remaining gaps in applying NLP to MHI, with the goal of improving clinical utility, data access, and fairness. 
    more » « less
  5. Multiverse analysis—a paradigm for statistical analysis that considers all combinations of reasonable analysis choices in parallel—promises to improve transparency and reproducibility. Although recent tools help analysts specify multiverse analyses, they remain difficult to use in practice. In this work, we identify debugging as a key barrier due to the latency from running analyses to detecting bugs and the scale of metadata processing needed to diagnose a bug. To address these challenges, we prototype a command-line interface tool, Multiverse Debugger, which helps diagnose bugs in the multiverse and propagate fixes. In a qualitative lab study (n=13), we use Multiverse Debugger as a probe to develop a model of debugging workflows and identify specific challenges, including difficulty in understanding the multiverse’s composition. We conclude with design implications for future multiverse analysis authoring systems. 
    more » « less
  6. Abstract An unhealthy diet is a major risk factor for chronic diseases including cardiovascular disease, type 2 diabetes, and cancer 1–4 . Limited access to healthy food options may contribute to unhealthy diets 5,6 . Studying diets is challenging, typically restricted to small sample sizes, single locations, and non-uniform design across studies, and has led to mixed results on the impact of the food environment 7–23 . Here we leverage smartphones to track diet health, operationalized through the self-reported consumption of fresh fruits and vegetables, fast food and soda, as well as body-mass index status in a country-wide observational study of 1,164,926 U.S. participants (MyFitnessPal app users) and 2.3 billion food entries to study the independent contributions of fast food and grocery store access, income and education to diet health outcomes. This study constitutes the largest nationwide study examining the relationship between the food environment and diet to date. We find that higher access to grocery stores, lower access to fast food, higher income and college education are independently associated with higher consumption of fresh fruits and vegetables, lower consumption of fast food and soda, and lower likelihood of being affected by overweight and obesity. However, these associations vary significantly across zip codes with predominantly Black, Hispanic or white populations. For instance, high grocery store access has a significantly larger association with higher fruit and vegetable consumption in zip codes with predominantly Hispanic populations (7.4% difference) and Black populations (10.2% difference) in contrast to zip codes with predominantly white populations (1.7% difference). Policy targeted at improving food access, income and education may increase healthy eating, but intervention allocation may need to be optimized for specific subpopulations and locations. 
    more » « less